To work through this, clone the repository (an RStudio project) at https://github.com/eriqande/make-a-BGP-map to get all the necessary input files, etc. Then open up the RStudio project and run though Make-a-BGP-map-Notebook.Rmd.
This chronicles the steps taken whilst making a genoscape for Willow Flycatchers. The order in which we discuss building up this map is different from the order in whic we put layers down to actually make the map. Here, we will start with making the genoscape, which is actually the part that sits atop the whole map. But we do that because that is the part that will actually change from species to species, and we want everyone to be relatively fresh for understanding those mutable parts. Once the genoscape is dealt with, we will then talk about the next layer of country and state boundaries, and also coastline polygons. We do this, because it doesn’t take too long to plot these features, and using them is a good way to figure out the desired extent of your map and to decide upon a projection. Finally, we will talk about the bottom layer of the map, which is the raster with earthforms and shading from Natural Earth Data.
This work draws on a few functions that I have in packages that I have up on GitHub, namely:
tess3r. Note that you can’t use the default version of tess3r, you have to use my fork of it, which has some extra functionality.genoscapeRtoolsGet those packages like this:
remotes::install_github("eriqande/TESS3_encho_sen") # for special version of tess3r
remotes::install_github("eriqande/genoscapeRtools") # for Eric's genoscapeRtools
The rest of the packages you need can be downloaded from CRAN. If you don’t have them you should get them: raster, sf, fields, downloader, and tidyverse. The last one there gets ggplot2 and a number of other packages by Hadley Wickham.
You can get those like this:
install.packages(c("raster", "sf", "tidyverse", "fields". "downloader"))
library(raster) # important to load before tidyverse, otherwise it masks select()
library(tidyverse)
library(sf)
library(ggspatial)
The genoscape are the bright colors smeared across space that show where different genetically identfiable groups of birds reside on the breeding grounds. These genoscapes are stored as rasters, and transparency is used to indicate how much confidence one has in the genetic identification of individuals in different areas. These rasters are made by interpolating Q-values from a program like STRUCTURE or ADMIXTURE between individuals that were sampled in space.
We need a matrix of Q-values for individuals. We have one that we will read in as a tibble
Q_tibble <- read_table2("inputs/breeding-bird-Q-values.txt")
Q_tibble
Column id is the sample name and the rest are ancestry fractions to different clusters (named with three characters) estimated by STRUCTURE.
We also need to know where those individuals were sampled in space. We have that here:
LatLong_tibble <- read_tsv("inputs/breeding-bird-lat-longs.tsv")
LatLong_tibble
The Q-values correspond to different clusters, as shown above. We must specify the colors that we wish to assign to each of those clusters. We do that with a named vector like this:
cluster_colors <- c(
INW = "#984ea3",
EST = "#377eb8",
PNW = "#4daf4a",
SSW = "#ff7f00",
SCC = "#ffff33",
KER = "#e41a1c",
WMT = "#00ffff"
)
For fun, we can plot these to see what they look like:
enframe(cluster_colors) %>%
mutate(x = 1:n(),
y = 1:n()) %>%
ggplot(aes(x = x, y = y, fill = name)) +
geom_point(pch = 21, size = 12) +
scale_fill_manual(values = cluster_colors)
Finally, we need to have a GIS Shapefile that tells us the range of the breeding birds, so that genoscape can be clipped properly. We read this shapefile with the st_read() function from package sf.
breeding_range <- st_read("inputs/wifl-shapefiles-revised/WIFLrev.shp")
For grins, we can plot those polygons to see what they look like:
ggplot(breeding_range) +
geom_sf() +
theme_bw()
Within Eric’s fork of tess3r is a function called tess3Q_map_rasters. It takes input from the objects we have above, but it takes that input as matrices rather than data frames, etc. so there is a little finagling to be done.
We need to ensure that we have values for birds in the right order (a job for aleft_join), and we also have to make it a matrix with Longitude in the first column and Lat in the second.
long_lat_tibble <- Q_tibble %>%
select(id) %>%
left_join(LatLong_tibble, by = "id") %>%
select(Long, Lat)
long_lat_matrix <- long_lat_tibble %>%
as.matrix()
Pull off the names of individuals and make a matrix of it:
Q_matrix <- Q_tibble %>%
select(-id) %>%
as.matrix()
For this, we use the above variables in tess3r::tess3Q_map_rasters(). Note that use namespace addressing for this function rather than load the tess3r
genoscape_brick <- tess3r::tess3Q_map_rasters(
x = Q_matrix,
coord = long_lat_matrix,
map.polygon = breeding_range,
window = extent(breeding_range)[1:4],
resolution = c(300,300), # if you want more cells in your raster, set higher
# this next lines need to to be here, but don't do much...
col.palette = tess3r::CreatePalette(cluster_colors, length(cluster_colors)),
method = "map.max",
interpol = tess3r::FieldsKrigModel(10),
main = "Ancestry coefficients",
xlab = "Longitude",
ylab = "Latitude",
cex = .4
)
# after that, we need to add names of the clusters back onto this raster brick
names(genoscape_brick) <- names(Q_tibble)[-1]
That gives us a raster brick of Q-values associated with each cell in the raster, but those values are not always constrained between 0 and 1, so we have to massage them a little bit in the next section.
For this we use the function genoscapeRtools::qprob_rando_raster(). This sets the colors between clusters to something reasonable and scales things so that the color in each cluster is at maximum opacity at the highest value within the clusters, etc. See ?genoscapeRtools::qprob_rando_raster to learn about the scaling options, etc.
This will squash the raster brick with one element for each cluster down to a single RGBA raster brick.
genoscape_rgba <- genoscapeRtools::qprob_rando_raster(
TRB = genoscape_brick,
cols = cluster_colors,
alpha_scale = 2.0,
abs_thresh = 0.0,
alpha_exp = 1.55,
alpha_chop_max = 230
)
# at this point, we must be explicit about adding a projection to the raster.
# This adds the info for a regular lat-long projection
crs(genoscape_rgba) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
We can easily plot this with the function layer_spatial from the ggspatial package:
ggplot() +
ggspatial::layer_spatial(genoscape_rgba) +
theme_bw() +
coord_sf()
Note that if we wanted to add the actual sampling points to this (that are given in long_lat_tibble), we can use ggspatial’s geom_spatial_point() function.
ggplot() +
layer_spatial(genoscape_rgba) +
geom_spatial_point(data = long_lat_tibble, mapping = aes(x = Long, y = Lat)) +
theme_bw() +
coord_sf()
It would probably be better to jigger those points a little bit. That could be done by mutating each of them a random bit away.
Since we will be using the Natural Earth Data Set for the raster background of our map, we will also use the Natural Earth lines and polygons. The Natural Earth data set is an amazing, open-source resource for making beautiful maps. Check it out at naturalearthdata.com.
For coastlines, countries, and states/provinces, you will need to download three different shape files. We will be working with the highest resolution Natural Earth data sets which are the 10m versions. Download and unzip them to a directory within this project called ne_shapefiles. You can do that with R like this:
dir.create("ne_shapefiles", showWarnings = FALSE)
tmpfile <- tempfile()
downloader::download(
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_coastline.zip",
dest = tmpfile
)
unzip(zipfile = tmpfile, exdir = "ne_shapefiles")
We do the same for country boundaries, and then state/province boundaries, too. Note, in both cases we just want to get the boundary lines that are internal to the coast. Otherwise you get a think line around the coast that is not so great…
# country boundaries
tmpfile <- tempfile()
downloader::download(
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_boundary_lines_land.zip",
dest = tmpfile
)
unzip(zipfile = tmpfile, exdir = "ne_shapefiles")
# state and province boundaries
# country boundaries
tmpfile <- tempfile()
downloader::download(
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces_lines.zip",
dest = tmpfile
)
unzip(zipfile = tmpfile, exdir = "ne_shapefiles")
First read it in:
coastlines <- st_read("ne_shapefiles/ne_10m_coastline.shp")
Reading layer `ne_10m_coastline' from data source `/Users/eriq/Documents/git-repos/make-a-BGP-map/ne_shapefiles/ne_10m_coastline.shp' using driver `ESRI Shapefile'
Simple feature collection with 4133 features and 3 fields
geometry type: LINESTRING
dimension: XY
bbox: xmin: -180 ymin: -85.22194 xmax: 180 ymax: 83.6341
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
Now, let’s just crop out the part that we want. This is somewhat key: right here we will define the extent in lat-long of the region that we want to plot:
# note! it is important to put the elements of domain in this
# order, because the function faster::extent() is expecting thing
# to be in this order, and doesn't parse the names of the vectors
# the way sf::st_crop() does.
domain <- c(
xmin = -135,
xmax = -60,
ymin = 22,
ymax = 60
)
coast_cropped <- st_crop(coastlines, domain)
although coordinates are longitude/latitude, st_intersection assumes that they are planar
attribute variables are assumed to be spatially constant throughout all geometries
Then plot the cropped part:
ggplot(coast_cropped) +
geom_sf() +
coord_sf()
OK, that was pretty reasonable. Now crop all the lines to domain and plot them, and put the genoscape on top of that, too.
countries_cropped <- st_read("ne_shapefiles/ne_10m_admin_0_boundary_lines_land.shp") %>%
st_crop(domain)
Reading layer `ne_10m_admin_0_boundary_lines_land' from data source `/Users/eriq/Documents/git-repos/make-a-BGP-map/ne_shapefiles/ne_10m_admin_0_boundary_lines_land.shp' using driver `ESRI Shapefile'
Simple feature collection with 462 features and 18 fields
geometry type: MULTILINESTRING
dimension: XY
bbox: xmin: -141.0055 ymin: -55.12092 xmax: 140.9776 ymax: 70.07531
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
although coordinates are longitude/latitude, st_intersection assumes that they are planar
attribute variables are assumed to be spatially constant throughout all geometries
states_cropped <- st_read("ne_shapefiles/ne_10m_admin_1_states_provinces_lines.shp") %>%
st_crop(domain)
Reading layer `ne_10m_admin_1_states_provinces_lines' from data source `/Users/eriq/Documents/git-repos/make-a-BGP-map/ne_shapefiles/ne_10m_admin_1_states_provinces_lines.shp' using driver `ESRI Shapefile'
Simple feature collection with 10114 features and 21 fields
geometry type: MULTILINESTRING
dimension: XY
bbox: xmin: -178.1371 ymin: -49.25087 xmax: 178.4486 ymax: 81.12853
epsg (SRID): 4326
proj4string: +proj=longlat +datum=WGS84 +no_defs
although coordinates are longitude/latitude, st_intersection assumes that they are planar
attribute variables are assumed to be spatially constant throughout all geometries
Now, plot it all. Notice we do it by just adding layers.
mapg <- ggplot() +
geom_sf(data = coast_cropped) +
geom_sf(data = countries_cropped, fill = NA) +
geom_sf(data = states_cropped, fill = NA) +
ggspatial::layer_spatial(genoscape_rgba) +
theme_bw()
# now plot it under default lat-long projection
mapg +
coord_sf()
That is looking like it should, but the projection is pretty ugly. In the next section we will consider projection.
The new version of ggplot makes it super easy to project everything on the fly by passing the projection to coord_sf(). For things in North America, it seems that a Lambert conic project does a nice job of keeping British Columbia and Alaska from looking way too big. We can define such a projection with a string, like this:
lamproj <- "+proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-100 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"
Now, let’s see how our developing map looks under such a projection. We just define lamproj as a coordinate reference system and pass it in to coord_sf():
mapg +
coord_sf(crs = st_crs(lamproj))
I would call that decidedly better.
Now, all that remains is to put all of this on top of a nicely tinted map that shows landforms and things. Note that once we have such a layer under the rest of this stuff, we will probably omit the coastline layer, since it gets a little dark where the coastline is highly dissected.
I am going to recommed that you get the hypsometrically tinted Natural Earth map with water bodies and rivers on it, and you might as well get the one that has some ocean basin coloring too. We will put it into a directory in the project called ne_rasters. Note, this expands to about half a gig of data.
dir.create("ne_rasters", showWarnings = FALSE)
tmpfile <- tempfile()
downloader::download(
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/raster/HYP_HR_SR_OB_DR.zip",
dest = tmpfile
)
trying URL 'https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/raster/HYP_HR_SR_OB_DR.zip'
Content type 'application/zip' length 404294054 bytes (385.6 MB)
==================================================
downloaded 385.6 MB
unzip(zipfile = tmpfile, exdir = "ne_rasters")
Reading a large raster in with the raster package does not take up much memory, because it leaves it on disk. So we will open a connection to the big, massive raster using the brick function from the raster package and then crop it:
hypso <- brick("ne_rasters/HYP_HR_SR_OB_DR/HYP_HR_SR_OB_DR.tif")
hypso_cropped <- crop(hypso, extent(domain))
Now, we just add hypso_cropped in with ggspatial::spatial_layer() below all of our other layers (dropping the coastlines), and we might make country and state lines thinner (and different sizes) and gray:
big_one <- ggplot() +
ggspatial::layer_spatial(hypso_cropped) +
geom_sf(data = countries_cropped, fill = NA, size = 0.4) +
geom_sf(data = states_cropped, fill = NA, size = 0.3) +
ggspatial::layer_spatial(genoscape_rgba) +
theme_bw() +
coord_sf(crs = st_crs(lamproj))
big_one
Seeing at sort of small on the screen does not really do justice to it. You can ggsave it in whatever size is desired.
I might update this shortly with strategies to cut out a rectangle from this so that it doesn’t have the conic appearance.
sessioninfo::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────────
─ Packages ──────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 3.6.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.0)
broom 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0)
class 7.3-15 2019-01-01 [2] CRAN (R 3.6.1)
classInt 0.4-2 2019-10-17 [1] CRAN (R 3.6.0)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
codetools 0.2-16 2018-12-24 [2] CRAN (R 3.6.1)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
DBI 1.0.0 2018-05-02 [1] CRAN (R 3.6.0)
digest 0.6.21 2019-09-20 [1] CRAN (R 3.6.0)
dotCall64 1.0-0 2018-07-30 [1] CRAN (R 3.6.0)
downloader 0.4 2015-07-09 [1] CRAN (R 3.6.0)
dplyr * 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
e1071 1.7-2 2019-06-05 [1] CRAN (R 3.6.0)
fields 9.9 2019-10-13 [1] CRAN (R 3.6.0)
forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.6.0)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
genoscapeRtools 0.1.0 2019-11-10 [1] Github (eriqande/genoscapeRtools@19e0f33)
ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.0)
ggspatial * 1.0.3 2018-12-14 [1] CRAN (R 3.6.0)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
haven 2.1.1 2019-07-04 [1] CRAN (R 3.6.0)
hms 0.5.2 2019-10-30 [1] CRAN (R 3.6.1)
httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.0)
KernSmooth 2.23-15 2015-06-29 [2] CRAN (R 3.6.1)
knitr 1.25 2019-09-18 [1] CRAN (R 3.6.0)
labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.1)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.0)
lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
maps 3.3.0 2018-04-03 [1] CRAN (R 3.6.0)
Matrix 1.2-17 2019-03-22 [2] CRAN (R 3.6.1)
modelr 0.1.5 2019-08-08 [1] CRAN (R 3.6.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
nlme 3.1-140 2019-05-12 [2] CRAN (R 3.6.1)
packrat 0.5.0 2018-11-14 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
purrr * 0.3.3 2019-10-18 [1] CRAN (R 3.6.0)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
raster * 3.0-7 2019-09-24 [1] CRAN (R 3.6.0)
Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
RcppEigen 0.3.3.5.0 2018-11-24 [1] CRAN (R 3.6.0)
readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0)
rgdal 1.4-6 2019-10-01 [1] CRAN (R 3.6.0)
rlang 0.4.1 2019-10-24 [1] CRAN (R 3.6.1)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
rvest 0.3.4 2019-05-15 [1] CRAN (R 3.6.0)
scales 1.0.0 2018-08-09 [1] CRAN (R 3.6.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
sf * 0.8-0 2019-09-17 [1] CRAN (R 3.6.0)
sp * 1.3-1 2018-06-05 [1] CRAN (R 3.6.0)
spam 2.4-0 2019-11-02 [1] CRAN (R 3.6.0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
tess3r 1.1.0 2019-11-10 [1] Github (eriqande/TESS3_encho_sen@2a0ce6c)
tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.6.0)
units 0.6-5 2019-10-08 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
xfun 0.10 2019-10-01 [1] CRAN (R 3.6.0)
xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
[1] /Users/eriq/Library/R/3.6/library
[2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library